Introduction



Investigations of differential item performance (the problem of
identifying items on which the performance of subpopulations is in some way
not consistent with their performance on other items) have had a long
history. Berk (1982), Shepard, Camilli and Averill (1981), Lord (1980) and
Scheuneman (1979) discuss various approaches to identifying items that seem
to "work" differently for various subgroups. The methods that have been
proposed are often referred to as "item bias" procedures. The more
psychometrically acceptable procedures compare the item performance of
members of different subgroups who are in some sense comparable.

When comparing two populations on any criterion, it is important that
only comparable members of the two groups be compared. For example,
comparing the item performance of Blacks and Whites when the groups' members
may have come from different parent distributions with respect to grade in
school would lead to a meaningless contrast. A better approach would be to
ensure that only Black third graders were compared with White third graders,
and so on. Which control variables are most important depends to a certain
extent on the purpose of the group contrasts.

The typical "item bias" approach to ensuring comparability of group
members is to use the total test score as a control variable. The question
becomes: do individuals from different groups who have the same total test
score perform the same way on a given item? If there is differential item
performance under this control condition, that difference in performance is
interpreted as attributable to characteristics of the particular item rather
than to differences in the characteristics of the individuals as measured by
the total score. If we do not control for total score, a finding of
differential item difficulty confounds examinee characteristics with item
characteristics, and we are simply measuring impact, not differential item
functioning (DIF). It should be kept in mind that this definition of DIF
assumes that the mathematics test is unidimensional; factor analysis of the
NAEP mathematics items suggests that this is the case.

Marascuilo and Slaughter (1981) suggested that a good way of examining
differential item functioning would be to use multi-way contingency tables
in which at least one of the classification variables is the control
variable. In the same spirit, Holland (1985) suggested using Mantel and
Haenszel's (1959) contingency table procedure to identify items with DIF
characteristics. A modification of the Mantel-Haenszel procedure as
suggested by Holland was used here.
The Mantel-Haenszel Procedure



The Mantel-Haenszel (MH) procedure first divides the reference group
(say, native English speakers) and the focal group (say, Spanish-speaking
Mexican-Americans) into subsets that are matched on total test score, and
then compares their performance on the items. For any given item, a matched
subset can be formed (say, individuals from either group who scored ten
correct on the total test), and a 2 x 2 table can then be formed in which
one dimension is the two groups being compared and the other is whether the
individuals got the item right or wrong. The cell entries in the 2 x 2 table
are the frequencies of rights and wrongs for the focal and reference groups.
For example, the following table might reflect the frequencies for the two
groups on the first item for those individuals who had a total score of 10.

                      Rights   Wrongs
Reference (White)       a        b
Focal (Ethnic)          c        d



The odds that a reference group member gets the item correct are a/b, while
the corresponding odds for a focal group member are c/d. The MH procedure
measures the advantage (or disadvantage) that reference group members have
over their matched counterparts in the focal group by the ratio of their
respective odds. For the ith matched group this gives the odds ratio estimate

\hat{\alpha}_i = \frac{a_i / b_i}{c_i / d_i} = \frac{a_i d_i}{b_i c_i}     (1)

where \hat{\alpha}_i estimates the population odds ratio for the ith matched
group on this particular item, and a_i, b_i, c_i, d_i are that group's cell
frequencies. If the odds ratio in (1) is well above 1.0, we infer that the
item favors the reference group; conversely, if it is well below 1.0, we
infer that it favors the focal group.
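As a rough illustration of the matching step and equation (1), the sketch below groups examinees into one 2 x 2 table per total-score level and computes that level's odds ratio. It assumes a hypothetical record layout, (group, total_score, correct), with the illustrative labels "R" for reference and "F" for focal; neither the layout nor the function names come from the NAEP files or from this report.

```python
from collections import defaultdict


def stratum_tables(records):
    """Build one 2 x 2 table per total-score level.

    Each record is a hypothetical (group, total_score, correct) tuple, where
    group is "R" (reference) or "F" (focal) and correct is True/False for the
    studied item. Returns {score: [a, b, c, d]}: a/b are reference rights/wrongs
    and c/d are focal rights/wrongs, matching the 2 x 2 layout in the text.
    """
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for group, score, correct in records:
        if group == "R":
            cell = 0 if correct else 1
        elif group == "F":
            cell = 2 if correct else 3
        else:
            raise ValueError(f"unknown group label: {group!r}")
        tables[score][cell] += 1
    return dict(tables)


def stratum_odds_ratio(a, b, c, d):
    """Equation (1) in cross-product form: (a/b) / (c/d) = a*d / (b*c).

    Returns None when b*c == 0, where the ratio is undefined or infinite.
    """
    if b * c == 0:
        return None
    return (a * d) / (b * c)


# Usage: 40 reference and 40 focal examinees, all with a total score of 10.
records = ([("R", 10, True)] * 30 + [("R", 10, False)] * 10
           + [("F", 10, True)] * 20 + [("F", 10, False)] * 20)
tables = stratum_tables(records)
print(tables[10])                       # [30, 10, 20, 20]
print(stratum_odds_ratio(*tables[10]))  # 3.0
```

An odds ratio of 3.0 here would mean that, at this score level, the reference group's odds of answering correctly are three times the focal group's.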
For a given item, the Mantel-Haenszel estimate \hat{\alpha}_{MH} is a
weighted average of the odds ratios \hat{\alpha}_i taken across all k matched
groups, where the weight for the ith group is b_i c_i / T_i and T_i is the
number of examinees at that total score level.

That is,

\hat{\alpha}_{MH} = \frac{\sum_{i=1}^{k} a_i d_i / T_i}{\sum_{i=1}^{k} b_i c_i / T_i}     (2)

where T_i = a_i + b_i + c_i + d_i is the total number of individuals at the
ith matched total score level.
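Equation (2) can be sketched in a few lines of Python. Each score level is assumed to be supplied as an (a, b, c, d) tuple in the same layout as the 2 x 2 table above; the function name is illustrative. Writing the estimator as a ratio of two sums, rather than averaging per-level odds ratios directly, keeps it defined even when one level has a zero cell.

```python
def mantel_haenszel_alpha(tables):
    """Equation (2): sum(a*d/T) / sum(b*c/T) over the k matched score levels.

    tables: iterable of (a, b, c, d) tuples, one per score level, where a/b are
    reference rights/wrongs and c/d are focal rights/wrongs.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables:
        t = a + b + c + d  # T_i: everyone at this score level
        if t == 0:
            continue  # an empty level contributes nothing
        numerator += a * d / t
        denominator += b * c / t
    if denominator == 0:
        raise ZeroDivisionError("sum of b*c/T is zero; alpha_MH is undefined")
    return numerator / denominator


# Usage: two hypothetical score levels, each with an odds ratio of 3.0,
# so their weighted average is also 3.0.
levels = [(30, 10, 20, 20), (10, 10, 5, 15)]
print(mantel_haenszel_alpha(levels))  # 3.0
```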

Associated with the estimate \hat{\alpha}_{MH} is a chi-square test with one
degree of freedom. The hypothesis being tested is that the odds ratios in all
k matched subsamples for a given item are equal to 1.0.
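The report does not give the formula for this test. The sketch below assumes the standard Mantel-Haenszel chi-square with a 0.5 continuity correction, which compares the observed number of reference-group rights, summed over score levels, with its expected value given each level's margins; whether the original analysis applied the correction is not stated here. Levels are (a, b, c, d) tuples as in the 2 x 2 layout above, and the function names are illustrative.

```python
import math


def mantel_haenszel_chi2(tables, continuity=True):
    """MH chi-square (1 df) for H0: the odds ratio is 1.0 at every score level.

    tables: iterable of (a, b, c, d) tuples, one per score level.
    Compares the observed sum of a with its expected value given the margins.
    """
    sum_a = sum_expected = sum_variance = 0.0
    for a, b, c, d in tables:
        t = a + b + c + d
        if t < 2:
            continue  # the variance term needs at least two examinees
        n_ref, n_foc = a + b, c + d          # group sizes at this level
        n_right, n_wrong = a + c, b + d      # item rights/wrongs at this level
        sum_a += a
        sum_expected += n_ref * n_right / t
        sum_variance += n_ref * n_foc * n_right * n_wrong / (t * t * (t - 1))
    if sum_variance == 0:
        raise ZeroDivisionError("no informative score levels")
    deviation = abs(sum_a - sum_expected)
    if continuity:
        # Continuity correction, clamped at zero for very small deviations.
        deviation = max(deviation - 0.5, 0.0)
    return deviation ** 2 / sum_variance


def chi2_1df_pvalue(x):
    """Upper-tail p-value for a 1-df chi-square: P(Z**2 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))


# Usage: two hypothetical score levels.
levels = [(30, 10, 20, 20), (10, 10, 5, 15)]
stat = mantel_haenszel_chi2(levels)
print(round(stat, 2))                # 6.85
print(chi2_1df_pvalue(stat) < 0.05)  # True
```

At the .05 level the 1-df critical value is about 3.84, which is consistent with where the asterisks fall in the tables (3.89 is starred, while 3.74 and 3.76 are not).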



Sample 



The data for this analysis come from the 1985-86 NAEP regular
assessment and from the NAEP special supplemental study of language minority
students. All White students in NAEP who received mathematics block 9M4 in
third grade, block 13M7 in seventh grade, and block 17M8 in eleventh grade
are contrasted with the Asian, Mexican American, Puerto Rican, and Cuban
students in the supplemental study who received those same blocks. Table 1
presents the sample for this study.

Table 1
Sample for this Study

                       Grade 3   Grade 7   Grade 11
White                     1367      1570       1580
Asian                      265       613        760
Mexican American          1238      1602       1022
Puerto Rican               566       624        458
Cuban                      292       347        566



For a detailed discussion of the sampling procedures in
NAEP and the special supplemental study, see: Johnson, E.,
Kline, D., Norris, N., & Rogers, A. (1987). National
assessment of educational progress 1985-1986 public use
tapes: Version I users' guide. Princeton, NJ: Educational
Testing Service; and National Assessment of Educational
Progress (1988). Findings from the NAEP 1985-86 special
study: The educational progress of language minority
children. Princeton, NJ: Educational Testing Service.



Although the supplemental study included a sample of
Native Americans, their numbers in each grade were too small
to generate reliable estimates in a DIF analysis, so they
are not included in this study. Appendix A presents the
items used in this analysis.
Results



Table 2 presents the third grade mathematics results when the Asians, 
Mexican- Americans, Puerto Ricans, and Cubans are each compared with the 
White reference group who took the same block of mathematics items. 

(Table 2 about here) 

The three columns under each ethnic group show the effect size, the
Mantel-Haenszel chi-square, and the odds ratio, respectively. As indicated
in the table, when the odds ratio is less than 1.0 and the effect size is
positive, there is some suggestion that the item favors the particular focal
group, i.e., the ethnic group does better on average on this item than would
be expected from its total test score. Conversely, if the odds ratio is
greater than 1.0 and the effect size is negative, there is some suggestion
that the item favors the reference group (Whites). Items marked with an
asterisk show statistically significant differential item functioning.
However, to protect against possible over-interpretation in the presence of
repeated statistical tests, we will only attempt to interpret differences
that are statistically significant and have an absolute effect size of 1.5
or greater. These effect sizes are expressed as differences in item
difficulty on the ETS delta scale (Holland & Thayer, 1985). One can loosely
interpret a DIF effect size of 1.5 as a difference of one and a half points
on the delta scale, whose standard deviation is 4. It has been ETS's
experience that differential performance of this magnitude can often be
explained by the content or cognitive demand required to solve the item.
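The effect sizes reported in the tables are consistent with the usual ETS conversion of the MH odds ratio to the delta metric, -2.35 ln(alpha_MH): for example, an odds ratio of 2.71 gives about -2.34, the value reported for item 12 for the Asians in Table 2. The sketch below applies that conversion together with the two interpretive criteria stated above. The function names are illustrative, and 3.841 is assumed as the .05 critical value of a 1-df chi-square.

```python
import math


def mh_delta(alpha_mh):
    """Express alpha_MH on the ETS delta scale: -2.35 * ln(alpha_MH).

    Positive values: the item is relatively easier for the focal group.
    Negative values: the item is relatively easier for the reference group.
    """
    return -2.35 * math.log(alpha_mh)


def classify_item(alpha_mh, chi2, chi2_critical=3.841, min_abs_effect=1.5):
    """Return a label only when the item is significant AND |effect| >= 1.5."""
    effect = mh_delta(alpha_mh)
    if chi2 < chi2_critical or abs(effect) < min_abs_effect:
        return None  # not interpreted under this report's criteria
    return "favors focal group" if effect > 0 else "favors reference group"


print(round(mh_delta(2.71), 2))    # -2.34
print(classify_item(2.71, 31.96))  # favors reference group
print(classify_item(0.38, 12.56))  # favors focal group
print(classify_item(1.36, 4.96))   # None (significant, but |effect| is about 0.72)
```

Because the tables report rounded odds ratios, recomputing the effect size from them can differ from the printed value in the second decimal (an odds ratio of 0.38 gives 2.27 against a printed 2.26).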

Inspection of Table 2, the third grade results, indicates that items 1
and 2 are both differentially easy for the ethnic groups (with the possible
exception of the Mexican-American group on item 1), while item 12 is
differentially hard for the ethnic groups when compared with the White
reference group. Items 1 and 2 are simple single-operation arithmetic
problems (subtraction of whole numbers) with no text involved. Item 12, in
contrast, is a more difficult item with considerable text in the stem, and it
also requires a basic notion of the concept of probability. One cannot
separate how much of this differential difficulty is due to the vocabulary
in the text and how much may be due to lack of exposure to the concept of
probability.

Table 3 presents the reference-focal group contrasts for the seventh
grade mathematics items.

(Table 3 about here) 

Inspection of Table 3 indicates that only two items meet both criteria:
statistical significance and an effect size whose absolute value is equal to
or greater than 1.5. Given these criteria, item 8 appears to be
differentially hard for the Asians when compared with the White reference
group, while item 1 appears to be differentially easy for the Puerto
Ricans. Item 8 carries a somewhat heavier load of textual material than many
of the other items. Other text-based items, such as items 2, 3, and 7, also
have negative effect sizes for the Asians, although none of them meets both
criteria; this suggests that the text may indeed make such items
differentially difficult for Asians.

Item 1, which appears to favor the Puerto Ricans, is indeed a simple
arithmetic operation with a small amount of text. The text, however, uses
easy, familiar vocabulary, and the solution requires only simple arithmetic.
Overall, there does not appear to be any systematic bias against the ethnic
groups that can be explained by a language problem. If one simply counts the
number of statistically significant comparisons, 20 favor members of the
ethnic groups and 14 favor the reference group.

Table 4 presents the reference-focal group comparisons on the eleventh 
grade mathematics test. 

(Table 4 about here) 

Inspection of the Table 4 results suggests that items 25 and 28 are
differentially difficult for most if not all of the ethnic groups. The table
also suggests that many of the negative effect sizes occur among the more
difficult items, many of which would be difficult for students who were less
likely to have been exposed to the concepts involved. For example, item 25
requires an understanding of the concept of probability, while item 28
requires a basic knowledge of geometry. It may well be that at least some
members of these ethnic groups are less likely than their White counterparts
to be exposed to curriculum that covers probability concepts. Item 28
requires knowledge of a basic definition in geometry as well as an inference
about what the item writer means by the term "combined" in the particular
context of the problem. It would seem that item 28 is differentially hard
for all ethnic groups because of this inference from the text.

It is interesting to note that one item (item 4) that was
differentially easy for Mexican-Americans and Puerto Ricans consisted of
reading and making a selection among graphs. It is not clear why this is
the case, except that there is very little text involved in the pictorial
presentation.
Conclusions



With the exception of the third graders, no simple explanation can be
put forth with any confidence for why some items were differentially
difficult or differentially easy for the ethnic group members. In the case
of the third graders, however, items that had little or no text and involved
simple arithmetic operations were differentially easier for many of the
ethnic group members. In general, there was no consistent evidence of
differential difficulty at any grade level: at each grade level there tended
to be as many items favoring the language groups as items favoring the
majority group.
Table 2

WHITE (REFERENCE GROUP) ITEM PERFORMANCE CONTRASTED WITH THAT OF EACH OF FOUR LANGUAGE
(FOCAL) GROUPS AT GRADE 3

       Asians                     Mexican Americans          Puerto Ricans              Cuban Americans
       N=265                      N=1238                     N=566                      N=292
       Effect¹     M-H    Odds²   Effect¹     M-H    Odds²   Effect¹     M-H    Odds²   Effect¹     M-H    Odds²
Item      Size      χ²    Ratio      Size      χ²    Ratio      Size      χ²    Ratio      Size      χ²    Ratio
   1      2.26   12.56*    0.38      1.31   21.56*    0.57      2.84   48.10*    0.30      2.51   24.14*    0.34
   2      1.92    9.24*    0.44      1.67   32.37*    0.49      1.77   21.91*    0.47      1.78   14.49*    0.47
   3      1.94   18.53*    0.44      0.56    5.94*    0.79      0.30    0.96     0.88      0.92    5.95*    0.68
   4     -0.10    0.01     1.04      0.46    3.17     0.82      0.59    3.35     0.78      0.28    0.44     0.89
   5      2.26   15.20*    0.38      0.91   12.51*    0.68      0.74    4.90*    0.73      0.63    2.17     0.76
   6     -0.25    0.20     1.11      0.43    2.94     0.83      0.45    1.88     0.83      0.32    0.53     0.87
   7      0.07    0.01     0.97     -0.04    0.02     1.02     -0.91   10.39*    1.47     -0.32    0.63     1.15
   8      0.04    0.00     0.98      0.43    3.57     0.83      0.75    5.54*    0.73      0.24    0.31     0.90
   9     -0.95    4.42*    1.50     -1.11   20.66*    1.60     -2.01   38.79*    2.35     -1.19    7.87*    1.66
  10     -0.70    2.65     1.34     -1.10   ?1.06*    1.59     -1.29   14.64*    1.73     -1.57   14.23*    1.94
  11     -1.15    6.18*    1.63     -0.57    5.25*    1.27     -0.80    5.69*    1.41     -0.65    2.24     1.32
  12     -2.34   31.96*    2.71     -1.58   37.48*    1.95     -1.54   18.03*    1.92     -2.01   19.37*    2.35
  13      0.34    0.65     0.86      0.26    1.13     0.89      1.14   11.63*    0.62     -0.31    0.?3     1.14
  14      0.38    0.74     0.85      0.61    4.85*    0.77      1.34   14.41*    0.57      1.03    5.94*    0.64
  15     -1.02    3.89*    1.55      0.03    0.00     0.99      0.07    0.01     0.97      0.06    0.00     0.97
  16      0.40    0.51     0.85     -0.59    2.23     1.29     -1.30    4.29*    1.74     -0.77    1.44     1.38
  17      0.82    0.44     0.71      0.59    0.63     0.78      0.33    0.03     0.87      0.40    0.03     0.84
  18     -0.49    1.35     1.23     -1.10   25.92*    1.59     -1.43   26.38*    1.84     -0.54    2.?7     1.26



¹ If the effect size is 1.5 or greater, it is interpreted as a practical difference in favor of the focal minority group. The
relationship is symmetric: an effect size of -1.5 or less indicates a difference favoring the White reference group.
² Odds ratios > 1.0 suggest that the item favors the reference group; odds ratios < 1.0 suggest the opposite.
* Significant M-H χ² at the .05 alpha level or better.
Table 3

WHITE (REFERENCE GROUP) ITEM PERFORMANCE CONTRASTED WITH THAT OF EACH OF FOUR LANGUAGE
(FOCAL) GROUPS AT GRADE 7

       Asians                     Mexican Americans          Puerto Ricans              Cuban Americans
       N=613                      N=1602                     N=624                      N=347
       Effect¹     M-H    Odds²   Effect¹     M-H    Odds²   Effect¹     M-H    Odds²   Effect¹     M-H    Odds²
Item      Size      χ²    Ratio      Size      χ²    Ratio      Size      χ²    Ratio      Size      χ²    Ratio
   1      0.54    1.13     0.80      1.25   18.42*    0.59      1.62   17.84*    0.50      0.79       ?     0.71
   2     -0.61    3.10     1.30     -0.39    3.27     1.18     -1.21   19.20*    1.67      0.19       ?        ?
   3     -0.15    0.33     1.07      0.25    1.33     0.90      0.13    0.24        ?      0.07       ?        ?
   4      0.42    1.59     0.84      0.56    7.06*    0.79      0.58    4.46*    0.78      0.84    5.60*    0.70
   5      0.00    0.00     1.00     -1.09   26.82*    1.59     -1.41   22.66*    1.82     -0.53    2.26     1.25
   6      0.00    0.00     1.00      0.1?    0.27     0.96      0.12    0.18     0.95      0.30    0.71     0.68
   7     -0.72    4.96*    1.36     -1.23   33.56*    1.68     -0.38    1.76     1.18     -0.20    0.25     1.09
   8     -1.51   32.08*    1.90     -0.74   13.42*    1.37     -1.10   15.86*    1.60     -0.95    7.49*    1.50
   9     -0.09    0.06     1.04     -0.60   10.01*    1.29     -0.46    3.33     1.22     -0.64    4.25*    1.31
  10      0.1?    0.18     0.94      0.12    0.33     0.95      0.07    0.05     0.97     -0.18    0.24     1.08
  11      0.12    ?.07     0.95     -0.28    1.54     1.13      0.59    3.82     0.78      0.63    2.29     0.76
  12     -0.21    0.52     1.09     -0.07    0.10     1.03     -0.26    0.96     1.12     -0.37    1.33     1.17
  13      0.70    5.50*    0.74      0.60    8.70*    0.77      1.08   14.79*    0.63      0.62    2.66     0.77
  14      0.91    8.07*    0.68      0.84   13.67*    0.70      1.08   11.90*    0.63      0.42    1.06     0.84
  15      1.03   10.65*    0.65      0.81   13.05*    0.71      0.99   10.02*       ?      0.73    3.25     0.73
  16      0.98    8.36*    0.66      0.99   19.60*    0.66      1.05   11.54*    0.64     -0.03    0.00     1.01
  17     -0.57    3.74     1.28     -0.46    4.37*    1.22     -0.14    0.15     1.06     -0.56    2.00     1.27
  18     -0.05    0.01     1.02     -0.37    2.55     1.17     -0.12    0.12     1.05     -0.10    0.04     1.04
  19      0.64    5.06*    0.76      0.46    3.89*    0.82      0.91    8.64*    0.68      0.47    1.44     0.82
  20      0.10    0.10     0.96     -0.14    0.39     1.06     -1.14   13.32*    1.62     -0.10    0.05     1.04
  21     -0.47    2.21     1.22      0.42    3.57     0.83     -0.07    0.03     1.03     -0.03    0.01     1.01
  22     -0.35    1.36     1.16     -0.18    0.56     1.08     -0.95    8.72*    1.50     -0.53    1.79     1.25



¹ If the effect size is 1.5 or greater, it is interpreted as a practical difference in favor of the focal minority group. The
relationship is symmetric: an effect size of -1.5 or less indicates a difference favoring the White reference group.
² Odds ratios > 1.0 suggest that the item favors the reference group; odds ratios < 1.0 suggest the opposite.
* Significant M-H χ² at the .05 alpha level or better.



Table 4

WHITE (REFERENCE GROUP) ITEM PERFORMANCE CONTRASTED WITH THAT OF EACH OF FOUR LANGUAGE
(FOCAL) GROUPS AT GRADE 11

       Asians                     Mexican Americans          Puerto Ricans              Cuban Americans
       N=760                      N=1022                     N=458                      N=566
       Effect¹     M-H    Odds²   Effect¹     M-H    Odds²   Effect¹     M-H    Odds²   Effect¹     M-H    Odds²
Item      Size      χ²    Ratio      Size      χ²    Ratio      Size      χ²    Ratio      Size      χ²    Ratio
   1      0.07    0.00     0.97      0.47    1.56     0.82     -0.26    0.25     1.11      0.41    0.50     0.84
   2      0.11    0.07     0.96     -0.04    0.01     1.02      0.36    1.02        ?     -0.29    0.67     1.13
   3      1.31    16.?*    0.57     -0.55    4.84*    1.26      0.26    0.59        ?      0.28    0.67     0.89
   4      0.97       ?     0.66         ?   31.49*    0.40      2.28   18.51*       ?      0.88    3.76     0.69
   5      0.16    0.34     0.93      0.13    0.26     0.95      0.40    1.64        ?     -0.01    0.00     1.01
   6      0.62    1.68     0.77      0.01    0.00     0.99      0.39    0.73        ?      0.27    0.26     0.89
   7      2.28   50.20*    0.38      0.70    8.02*    0.74      1.25   14.48*
¹ If the effect size is 1.5 or greater, it is interpreted as a practical difference in favor of the focal minority group. The
relationship is symmetric: an effect size of -1.5 or less indicates a difference favoring the White reference group.
² Odds ratios > 1.0 suggest that the item favors the reference group; odds ratios < 1.0 suggest the opposite.
* Significant M-H χ² at the .05 alpha level or better.
APPENDIX A



Nota Bene: These items are not to be circulated. They have not been
released to the public.